Abstract


Given an ordered universe U, we study the problem of representing each subset of U by a unique
binary search tree so that dictionary operations can be performed efficiently. We exhibit representations
that permit the execution of dictionary operations in optimal time when the dictionary is
sufficiently sparse or sufficiently dense. We apply unique representations to obtain efficient data
structures for maintaining a collection of sets/sequences under queries that test the equality of a
pair of objects. In the process, we devise an interesting method for maintaining a dynamic, sparse
array.


*This work was begun when the author was at Bellcore and continued at the DIMACS center and the Courant Institute.
1  Introduction


The unique representation problem for an abstract data type (e.g. a dictionary) is to design a unique
representation  for  values  of  the  abstract  data  type  by  values  of  a  concrete  data  type  (e.g.  a  binary 
search  tree)  so  that  operations  of  the  abstract  data  type  can  be  performed  efficiently.  More  concretely, 
consider  the  problem  of  uniquely  representing  a  dictionary  over  an  ordered  universe  by  a  binary  search 
tree.  The  cost  of  searching  for  an  item  in  the  dictionary  equals  its  depth  in  the  corresponding  binary 
search  tree.  The  cost  of  performing  an  update  operation  (insert  or  delete)  on  the  dictionary  is  the 
time  needed  to  transform  the  tree  corresponding  to  the  initial  set  into  the  tree  corresponding  to  the 
final  set.  Let  T(n)  denote  the  worst-case  cost  of  performing  a  dictionary  operation  on  an  n-element 
dictionary  under  this  representation.  The  problem  is  to  choose  a  representation  that  minimizes  T(n) 
for  all  values  of  n. 

The unique representation problem arises in the contexts of incremental evaluation and implementation
of high level programming languages. Incremental evaluation [11,12] is the technique of
efficiently updating the value of a function when the input changes. A simple idea to speed up
incremental evaluation is to remember the results of previous function calls and thereby avoid recomputation.
If the input domain is uniquely represented and input data objects are constructed using
Cons operations [2], then it becomes easy to check if the function has already been evaluated on a
given input. Modern programming languages such as SETL support high level data types such as
sets and sequences and permit testing whether two such objects are equal as a fundamental operation.
Equality-testing can be implemented in constant time by devising a unique representation for the
data type and manipulating concrete data values through Cons operations.

Before proceeding further, we need to introduce some basic terminology. A dictionary is a set of
items selected from a totally ordered universe on which membership queries and update operations
that insert or delete items can be performed. We can represent a dictionary by a binary tree containing
one item per node, with the items arranged in symmetric order: each item in the tree is larger than
the items in its left subtree and smaller than the items in its right subtree. This data structure is
called a binary search tree. A binary search tree is represented in storage as a collection of records,
one per node, with a pointer to the record corresponding to the root. Each record has a key field, left
and right children pointers, and a bounded number of additional information and pointer fields. A
rotation of an edge [x, p] in a binary tree, where p is the parent of x, is a transformation that makes
x the parent of p by transferring one of the subtrees of x to p. See Figure 1. Rotations are useful in
updating binary search trees since they preserve the symmetric order of the items.
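
The rotation just described can be sketched in a few lines of Python. This is only an illustration: the names `Node`, `rotate` and `inorder` are ours, not the paper's, and the record is reduced to a key and two child pointers.

```python
class Node:
    """One record per node: a key plus left and right child pointers."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def rotate(p, x):
    """Rotate the edge [x, p], where p is the parent of x; return the new subtree root x."""
    if p.left is x:          # right rotation: the right subtree of x moves to p
        p.left = x.right
        x.right = p
    elif p.right is x:       # left rotation: the left subtree of x moves to p
        p.right = x.left
        x.left = p
    else:
        raise ValueError("x is not a child of p")
    return x

def inorder(t):
    """List the keys of t in symmetric order."""
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

# Example: rotate the edge [2, 4] in the tree 4(2(1, 3), 5).
x = Node(2, Node(1), Node(3))
p = Node(4, x, Node(5))
root = rotate(p, x)
```

After the call, node 2 is the parent of node 4, the subtree holding 3 has moved from x to p, and the symmetric order of the keys is unchanged.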

Now, we survey previous work on unique representations. Snyder [15] considers the problem of
representing subsets of an ordered universe by tree-based search structures so that all sets of equal
cardinality have the same underlying tree. For this class of representations he showed that $\Theta(\sqrt{n})$
time is both necessary and sufficient to perform a dictionary operation, where n is the dictionary
size. Munro and Suwanda [10] examine how to implicitly represent subsets of an ordered universe by
enforcing, for each set size, a unique partial order on the storage locations. They showed that $\Theta(\sqrt{n})$
time is necessary and sufficient to perform dictionary operations using such a representation. Recently,
researchers have started using randomization to obtain solutions to the unique representation problem
with provably good average behaviour. Pugh [11] and Pugh and Teitelbaum [12] have devised efficient
randomized representations for sets, sequences and other abstract data types. Aragon and Seidel [3]
show how to uniquely represent a dictionary by a binary search tree so that dictionary operations can
be performed in $O(\log n)$ randomized time.

In  this  paper  we  study  the  deterministic  complexity  of  uniquely  representing  a  dictionary  by  a 
binary  search  tree.  We  consider  the  following  three  ways  of  updating  a  binary  search  tree  during  an 
insertion or deletion:

1.  Performing  a  sequence  of  rotations:  We  update  the  tree  using  the  traditional  algorithms  for 
binary  search  tree  insertion  and  deletion  [8],  but  permit  arbitrary  rotations  to  be  performed  on 
the  tree  before  and  after  so  that  the  final  tree  correctly  represents  the  updated  dictionary. 

2.  Changing  the  pointers  of  a  set  of  nodes. 

3. Nondestructive updating through Cons operations: At any time during the sequence of dictionary
operations, we maintain all the trees created so far and their subtrees. An update operation
constructs a new tree by sharing subtrees from existing trees and creating new nodes corresponding
to nonexistent subtrees. A Cons operation is used to determine whether a subtree already
exists, given its left and right subtrees and root item, and create the subtree if necessary. The
cost of a Cons operation is assumed to be constant. Also, in anticipation of the future, the
update operation may create any number of additional trees through Cons operations.

For each updating method, we devise representations that permit efficient implementation of the
dictionary using that method. Culik and Wood [4] have proved that the number of rotations needed
to transform any n-node binary tree into any other n-node binary tree is at most $2n - 2$, and Sleator,
Tarjan, and Thurston [14] have improved this bound to $2n - 6$, for $n > 12$. These results imply
that any unique representation supports dictionary operations in $O(n)$ time using rotations. Snyder's
representation [15] requires $O(\sqrt{n})$ time per dictionary operation and uses pointer changes to update
the tree. We provide a representation that supports dictionary operations in $O(\sqrt{n})$ time using Cons
operations.

We also prove lower bounds on the complexity of any unique representation that are valid if the
dictionary is sufficiently sparse (i.e., if n, the dictionary size, is small in relation to $|U|$). The lower
bounds are derived for the following cost metric. The cost of an update using one of the above methods
is, respectively, the number of rotations, the number of pointer changes, and the number of newly
created nodes. The cost of searching for an item in a binary search tree is 1 plus the depth of the
item in the tree. Under the assumption that the dictionary is sparse, we show that the preceding
upper bounds are optimal. In contrast, Aragon and Seidel’s representation requires only $O(\log n)$
randomized time per dictionary operation using any of the three updating methods. It is surprising
that this problem has such widely differing deterministic and randomized complexities. We show that
the sparseness assumption in the lower bounds is essential by constructing a representation that is
optimal on dense dictionaries. It requires $O(\log |U|)$ time per dictionary operation using any update
strategy.

Next,  we  apply  unique  representations  to  the  problems  of  equality-testing  of  sets  and  sequences: 

Sequence equality-testing problem: Maintain a collection of sequences over an ordered universe
under the operations i) EQUAL(S, T): Return true if sequences S and T are equal, ii) INSERT(S, i, x, T):
Create a new sequence T by inserting element x into sequence S between positions $i - 1$ and $i$, and
iii) DELETE(S, i, T): Create a new sequence T by deleting the ith element of sequence S. Initially,
the collection consists of only the null sequence.

Set equality-testing problem: Maintain a collection of sets over an ordered universe under the
operations EQUAL(S, T), INSERT(S, x, T), and DELETE(S, x, T), defined in the obvious fashion. Initially,
the collection consists of only the empty set.

For  both  problems,  we  are  interested  only  in  data  structures  that  test  equality  in  constant  time.  We 
now describe previous work on these problems. Carter and Wegman [17] proposed a randomized
signature-based scheme for set equality-testing that requires only constant time per operation but
may declare erroneously that two unequal sets are equal with a small probability. Pugh [11] and Pugh
and Teitelbaum [12] gave randomized solutions to both problems that require $O(\log n)$ expected time
and $O(\log n)$ expected space per update operation, where n denotes the size of the set or sequence.
Their data structures also support more powerful operations (for instance, union and intersection of
sets) efficiently. However, for sequence equality-testing, the logarithmic bound is valid only if the
sequences do not contain repeated elements. Yellin [19] described two deterministic solutions for
equality-testing of sets, both of which use a prohibitive amount of space. His solutions assume that the collection
of sets is fixed and that the sets are updated destructively.

We obtain the following results on these problems. A straightforward solution to set equality-testing
is to represent sets by binary tries and use Cons operations to update a trie. The solution
requires $O((\log m)^2)$ time and $O(\log m)$ space per update, where m is the total number of operations.
We reduce the time per update in this solution to $O((\log m)^{3/2})$ (amortized) by developing a technique
for maintaining a dynamic sparse array efficiently. For any fixed $\epsilon > 0$, this method yields a data
structure for set equality-testing requiring $O(\log m)$ time and $O(m^{\epsilon})$ space per update. For sequence
equality-testing, we propose two solutions. One solution requires $O(\sqrt{n} \log m)$ amortized time and
$O(\sqrt{n})$ amortized space per update, while the other requires 0(^(logm)’^^ + logm) amortized time
and 0(<yn(logm)’/^) amortized space. Here, n denotes the size of the sequence being updated and
m denotes the total number of operations. The first solution is faster or slower than the second,
depending on whether m > 2" or not. The update time in the first solution can be further reduced
to $O(\sqrt{n})$ by increasing the space required per update to $O(\sqrt{n}\, m^{\epsilon})$.

We note an interesting connection between the two problems. When sequences are free from
repetitions, sequence equality-testing is equivalent to set equality-testing. This follows from the representation
of sets by sorted sequences and the representation of sequences by sets of ordered pairs
of adjacent elements. Therefore, sequence equality-testing becomes truly hard only when sequences
have repetition.
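
The two encodings behind this equivalence can be sketched as follows. This is a hedged illustration only: the function names and the `START`/`END` sentinels are assumptions of ours, added so that the empty and one-element sequences also receive distinct encodings; the paper does not spell out these details.

```python
START, END = object(), object()   # sentinels marking the two ends (our assumption)

def set_to_sequence(s):
    """Encode a set as its sorted sequence."""
    return tuple(sorted(s))

def sequence_to_set(seq):
    """Encode a repetition-free sequence as the set of ordered pairs of adjacent elements."""
    padded = (START,) + tuple(seq) + (END,)
    return frozenset(zip(padded, padded[1:]))

def decode(pairs):
    """Recover the sequence; well defined because each element has a single successor."""
    succ = dict(pairs)
    out, cur = [], succ[START]
    while cur is not END:
        out.append(cur)
        cur = succ[cur]
    return tuple(out)
```

Two repetition-free sequences are equal exactly when their pair sets are equal, and an insertion or deletion in the sequence changes only a constant number of pairs. With repeated elements the successor map is no longer a function, which is where the encoding breaks down.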

The  above  data  structures  are  based  on  a  method  of  maintaining  a  dynamic  sparse  array  efficiently. 

Dynamic sparse array problem: Maintain an array of size N, all of whose entries are initially 0,
efficiently under queries and updates of entries. The total number of updates is assumed to be very
small relative to N.

Dynamic perfect hashing [5] gives a randomized solution requiring $O(1)$ time per query and $O(1)$
randomized amortized time and $O(1)$ amortized space per update. Tarjan and Yao [16] give a deterministic
solution to the problem when the array is static, but, even then, their preprocessing time
(quadratic in the number of nonzero entries) is too expensive for our applications. We propose a
solution to the problem that requires $O(\sqrt{\log N})$ time per query and $O(\sqrt{\log N})$ amortized time and
$O(1)$ amortized space per update. This implies that the time required by our method to store a
sparse array of size N having n nonzero entries is $O(n\sqrt{\log n})$, while the space required is $O(n)$. In
contrast, Tarjan and Yao's static method requires $O(n^2)$ time and $O(n)$ space, but supports constant
time queries. The time per operation in our solution can be reduced to $O(1)$ by increasing the space
per update to $O(N^{\epsilon})$.

The  paper  is  organized  as  follows.  In  Section  2,  we  describe  the  optimal  representation  for  sparse 
dictionaries  that  uses  Cons  operations.  The  lower  bounds  for  representing  sparse  dictionaries  are 
established  in  Section  3.  Section  4  contains  the  optimal  representation  for  dense  dictionaries.  In 
Sections 5, 6, and 7, respectively, we describe the data structures for dynamic sparse arrays and set
and sequence equality-testing. The last section concludes by posing open problems.

2  An  optimal  representation  for  sparse  dictionaries 

We describe a representation that allows search in $O(\log n)$ time and permits updating the binary
search tree in $O(\sqrt{n})$ time using Cons operations. We represent the dictionary by an almost complete
binary search tree. Specifically, if $n = 2^{i_1} + 2^{i_2} + \cdots + 2^{i_k}$ such that $i_1 > i_2 > \cdots > i_k$, then the binary
search tree representing an n-element dictionary comprises a k-node right path, with a sequence of k
complete binary trees of respective sizes $2^{i_1} - 1, 2^{i_2} - 1, \ldots, 2^{i_k} - 1$ hanging off of it. See Figure 2a.
In order to be able to update the tree efficiently, we need to maintain some additional information.
Define a j-run to be a subset of $2^j - 1$ adjacent elements in the dictionary. For each $j \le \lfloor (\log n)/2 \rfloor$,
we maintain complete binary trees representing the j-runs of the dictionary, called the j-trees, in a
sorted, doubly linked list. See Figures 2b and 2c. When n has the form $2^{2i} - 2^i + k$, where $1 \le k < 2^i$,
in addition, we also maintain the list of i-trees ($i = \lfloor (\log n)/2 \rfloor + 1$) corresponding to the first $k 2^i$ i-runs.
When an insertion causes n to increase from $2^{2i} - 1$ to $2^{2i}$, this list becomes the list of $\lfloor (\log n)/2 \rfloor$-trees.

It is obvious that this representation supports search in $O(\log n)$ time. To insert a new element x,
we first update the lists of trees, proceeding bottom up, and then create the rest of the tree from trees
in the updated lists. Updating the list of j-trees, for any j, involves deleting at most $2^j - 2$ j-trees
corresponding to old j-runs that contain the predecessor as well as the successor of x and inserting
in their place at most $2^j - 1$ j-trees corresponding to new j-runs that contain x. Creating a new
j-tree is accomplished in $O(1)$ time by performing a Cons operation on two $(j - 1)$-trees that are
its subtrees and its root item. In addition, if n is of the form $2^{2i} - 2^i + k$, where $1 \le k < 2^i$, we
need to extend the size of the list of i-trees from $k 2^i$ to $(k + 1) 2^i$ by creating at most $2^i$ new i-trees.
The foregoing discussion shows that the list of j-trees, for any j, may be updated in $O(2^j)$ time once
the updated list of $(j - 1)$-trees is available. The total cost of updating the lists of trees is at most
$O(2 + 2^2 + \cdots + 2^{\lfloor (\log n)/2 \rfloor + 1}) = O(\sqrt{n})$. To create the rest of the tree, we need to create the nodes of
the tree at levels $\lfloor (\log n)/2 \rfloor + 1, \lfloor (\log n)/2 \rfloor + 2, \ldots, \lfloor \log n \rfloor + 1$ and the nodes on the right path of the tree.
These nodes can be created bottom up from the updated lists of trees by performing $O(\sqrt{n})$ Cons
operations. It follows that the cost of an insertion is $O(\sqrt{n})$. Deletion is analogous.
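
To make the shape of the almost complete tree concrete, the following Python sketch builds it from scratch for a sorted list of keys. It is only meant to show which tree represents a given set: it takes linear time, it does not maintain the lists of j-trees, and so it is not the $O(\sqrt{n})$ update procedure described above. The tuple encoding `(left, key, right)` and the function names are our own.

```python
def complete(keys):
    """Complete binary search tree on a sorted list of 2^i - 1 keys (None if empty)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return (complete(keys[:mid]), keys[mid], complete(keys[mid + 1:]))

def canonical(keys):
    """Right path of k nodes whose left subtrees are complete trees of
    sizes 2^{i_1} - 1 > 2^{i_2} - 1 > ... > 2^{i_k} - 1."""
    n = len(keys)
    if n == 0:
        return None
    top = 1 << (n.bit_length() - 1)      # largest power of two not exceeding n
    left = complete(keys[:top - 1])      # the first 2^{i_1} - 1 keys
    return (left, keys[top - 1], canonical(keys[top:]))

def inorder(t):
    """Keys of t in symmetric order."""
    return [] if t is None else inorder(t[0]) + [t[1]] + inorder(t[2])
```

For example, with n = 6 = 4 + 2 the right path has two nodes (keys 4 and 6), with complete trees of sizes 3 and 1 hanging off it. The shape depends only on n, so all sets of the same size share one underlying tree.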

This  completes  the  description  of  our  representation.  The  scheme  has  the  drawback  of  using 
$\Omega(\sqrt{n})$ space per update. However, this is inevitable, for we will see in the next section that any
unique  binary  search  tree  representation  suffers  from  this  defect  if  the  dictionary  is  sparse. 

3  Lower  bounds  for  sparse  dictionaries 

Snyder's lower bound [15] of $\Omega(\sqrt{n})$ on the cost of a dictionary operation is applicable when sets of
any particular cardinality are represented by a single underlying multiway tree and pointer changes
are used to update the tree. Our lower bounds apply to the class of unique binary search tree
representations in which equal-cardinality sets are allowed to have different underlying binary trees.
We prove the lower bounds through an interesting application of Ramsey's theorem [7,9]. This method
can be used to show that Snyder's lower bound holds even if his restriction regarding equal cardinality
sets is removed. We need the following version of Ramsey's theorem to state and to prove the lower
bound results:

Ramsey's theorem Let n, k and s > n be arbitrary positive integers and let U be an arbitrary set.
There exists a number $R_n(k, s)$ with the following property: if $|U| \ge R_n(k, s)$ then, for any partition
of the n-subsets of U into k classes, there is an s-subset S all of whose n-subsets lie in a single class.
It is known that $R_n(k, s) \le T_{n+1}(O(\log ks))$ [7,9], where $T_n(x)$ is the tower function.


First, we state the lower bounds for updating the tree by means of rotations and pointer changes.
Let $b_n = \binom{2n}{n}/(n + 1)$ denote the number of distinct n-node binary trees [8]. Let n and d be positive
integers and let U be an ordered universe of size at least $R_n(b_n, n + 1)$. For any unique binary search
tree representation of subsets of U we have the following trade-offs between search and update times:

Theorem 1 If the n-subsets of U are represented by binary search trees of height at most d, then
there is an n-subset on which an update operation requires $\Omega(n - 2d)$ rotations.

Theorem 2 If the n-subsets of U are represented by binary search trees of height at most d, then
there is an n-subset on which an update operation requires $\Omega(n/d)$ pointer changes.

The Ramsey number $R_n(b_n, n + 1)$ is at most $T_{n+1}(cn)$, for some constant c. Therefore these
trade-offs are valid when the dictionary size is at most $\log^* |U| - o(\log^* |U|)$.

Theorem 1 implies that a dictionary operation requires $\Omega(n)$ time if the dictionary is sufficiently
sparse and rotations are used to update the tree. Theorem 2 implies that a dictionary operation
requires $\Omega(\sqrt{n})$ time if pointer changes are used to update the tree.

These trade-offs are actually realizable. To realize the trade-off of Theorem 2, we represent an
n-element set by a complete binary tree of size n/d with chains of length d each attached at the
leaves. If the depth parameter d increases “smoothly” with n and $\log n \le d \le \sqrt{n}$ always, then this
representation allows update operations to be performed in $O(n/d)$ time. We leave the representation
that realizes the trade-off of Theorem 1 as an (easy) exercise.

The  next  theorem  gives  the  lower  bound  on  the  cost  of  updating  the  tree  using  Cons  operations. 

Theorem 3 Let n and m be positive integers and let U be an ordered universe of size at least
$R_n(b_n, m + n)$. For any unique binary search tree representation of subsets of U, there is a sequence of
m update operations that involves only subsets of size at most n and causes the creation of $\Omega(m\sqrt{n})$
new nodes when Cons operations are used to implement updates.

We have $R_n(b_n, m + n) \le T_{n+1}(c(n \log m))$, for some constant c. Hence the lower bound of
$\Omega(\sqrt{n})$ node creations per update holds when $n \le \log^* |U| - \log^* m - o(\log^* |U|)$.

We  are  now  ready  to  prove  the  theorems. 

Proof of Theorem 1. By Ramsey’s theorem, there is a subset $S = \{x_1, x_2, \ldots, x_{n+1}\}$ of U all of
whose n-subsets are represented by a single underlying binary tree, say B. We claim that $\Omega(n - 2d)$
rotations are necessary to transform set $S_1 = \{x_1, x_2, \ldots, x_n\}$ into set $S_2 = \{x_2, x_3, \ldots, x_{n+1}\}$ by
deleting $x_1$ and inserting $x_{n+1}$. We prove the claim using two lemmas.

For any binary tree T, let $T^l$ denote the binary tree with T as left subtree and empty right subtree.
Define $T^r$ similarly.

Lemma 1 For any binary tree T, at least $|T|$ rotations are needed to transform $T^l$ into $T^r$.

Proof. By induction on $|T|$.

Case 1. $|T| = 1$: Easy.

Case 2. $|T| \ge 2$: The proof has the flavor of Wilber’s method [18] of deriving lower bounds on
rotations in binary trees. Consider an n-node binary tree B whose nodes are labeled from 1 to n in
symmetric order. An interval $[i, j]$ of nodes of B is a block of B. Any block $[i, j]$ of B induces a binary
tree, called the block tree of block $[i, j]$, obtained by contracting all edges of B having an endvertex
outside the block. See Figure 3. A rotation on B propagates into a rotation on a block tree of B only
if both nodes of the rotation lie in the block tree; otherwise the block tree is unaffected.

Let $T_1$ and $T_2$ denote the left and right subtrees of T. Divide the nodes of $T^l$ and $T^r$ into a left block
$[1, |T_1| + 1]$ and a right block $[|T_1| + 2, |T| + 1]$. Then the left block trees of $T^l$ and $T^r$ are, respectively,
$T_1^l$ and $T_1^r$. See Figure 4. By the inductive hypothesis, at least $|T_1|$ left block rotations must be
performed to transform $T_1^l$ into $T_1^r$. Similarly, $|T_2|$ right block rotations are needed to transform the
right block tree $T_2^l$ of $T^l$ into the right block tree $T_2^r$ of $T^r$. Since the roots of $T^l$ and $T^r$ belong to
different blocks, at least one rotation involving nodes from both blocks is performed in transforming
$T^l$ into $T^r$. Any single rotation on $T^l$ falls exactly into one of the three categories. Hence, in total, at
least $|T_1| + |T_2| + 1 = |T|$ rotations are needed to transform $T^l$ into $T^r$. □

Lemma 2 Let r denote the number of rotations required to transform $S_1$ into $S_2$ by deleting $x_1$ and
inserting $x_{n+1}$. Then $B^l$ can be transformed into $B^r$ in $2r + 2d + 1$ rotations.

Proof. Suppose that $S_1$ is transformed into $S_2$ by performing the following transformations on
the underlying binary tree:

$$B \xrightarrow{\sigma_1} B_1 \xrightarrow{\text{delete } x_1} B_2 \xrightarrow{\sigma_2} B_3 \xrightarrow{\text{insert } x_{n+1}} B_4 \xrightarrow{\sigma_3} B$$

Here $\sigma_j$ denotes a sequence of rotations. We transform $B^l$ into $B^r$ as follows: apply $\sigma_1$, rotate $x_1$ to
the root, apply $\sigma_2$, rotate $x_{n+1}$ to the position where it is inserted, and apply $\sigma_3$.

The number of rotations needed to move $x_1$ to the root is at most $|\sigma_1| + d + 1$. Likewise, $x_{n+1}$ can be
moved to the correct position in $|\sigma_3| + d$ rotations. Hence $2|\sigma_1| + |\sigma_2| + 2|\sigma_3| + 2d + 1 \le 2r + 2d + 1$
rotations suffice to transform $B^l$ into $B^r$. □

Combining the two lemmas, it follows that $\Omega(n - 2d)$ rotations are required to transform $S_1$ into $S_2$
by deleting $x_1$ and inserting $x_{n+1}$. The insertion of $x_{n+1}$ into set $\{x_2, x_3, \ldots, x_n\}$ and the deletion of
$x_{n+1}$ from set $S_2$ are inverse transformations and require the same number of rotations. Hence
$\Omega(n - 2d)$ rotations are necessary for either the deletion of $x_1$ from set $S_1$ or the deletion of $x_{n+1}$ from
set $S_2$. □

Proof  of  Theorem  2.  The  proof  applies  Ramsey’s  theorem  on  the  n-subsets  of  U  and  then  uses 
Snyder’s  lower  bound  argument  [15]. 

By Ramsey’s theorem, there is a subset $S = \{x_1, x_2, \ldots, x_{n+1}\}$ of U all of whose n-subsets are
represented by the same underlying binary tree. Let L denote the subset of elements that occupy the
leaves in the binary search tree representation of set $S_1 = \{x_1, x_2, \ldots, x_n\}$. Note that $|L| \ge (n - 1)/d$,
since a binary tree with height d and $|L|$ leaves has at most $d|L| + 1$ nodes. If we delete $x_1$ and
insert $x_{n+1}$ into this set, we obtain the set $S_2 = \{x_2, x_3, \ldots, x_{n+1}\}$. Since $S_1$ and $S_2$ have the same
underlying binary tree, every element of $L - \{x_1\}$ occupies an internal node in the binary search tree
representation of $S_2$. The same argument shows that the leaf elements of $S_2$ occupy the internal nodes
of the binary search tree for $S_1$. Therefore at least $2|L| - 2$ pointer changes are needed to transform
$S_1$ into $S_2$. It follows that at least $|L| - 1 = \Omega(n/d)$ pointer changes must be performed while either
deleting $x_1$ from set $S_1$ or inserting $x_{n+1}$ into set $\{x_2, x_3, \ldots, x_n\}$. The theorem is now proved in the
same manner as Theorem 1. □

Proof of Theorem 3. By Ramsey’s theorem, there is a subset $S = \{x_1, x_2, \ldots, x_{m+n}\}$ of U all of
whose n-subsets are represented by the same underlying binary tree. Let T denote an n-subset of S,
to be specified later, and let B be a variable that initially denotes the binary search tree corresponding
to T. The lower bound construction repeats the following cycle of operations on B until m updates
have been performed on the tree:

Cycle.

1. Repeat $\sqrt{n}$ times:

Delete the smallest element from B and insert a ‘fresh element’ from S that is larger
than all existing elements of B. Throughout this construction, by ‘fresh element’, we
mean an element that has never been in the tree before.

2. Call a node of B heavy if it has $\sqrt{n}$ or more descendants, and minimally heavy if it
is heavy and has no proper heavy descendants. Let $y_1, y_2, \ldots, y_k$ and $x_1, x_2, \ldots, x_k$
denote, respectively, the minimally heavy nodes of B and their respective rightmost
descendants. Replace the elements stored in nodes $x_1, x_2, \ldots, x_k$ by fresh elements
from S that occupy the same positions in the linear ordering of the elements of the
dictionary.

For the lower bound construction to be valid, it is essential to correctly choose the elements of the
initial set T and the fresh elements that are inserted into the tree each time. To make a proper choice
for these elements, we first determine their relative linear ordering by executing the construction,
treating the elements as unknowns. Next, we choose the elements from S, satisfying their relative
ordering.

We claim that each iteration of the cycle performs at most $4\sqrt{n}$ update operations on the tree and
causes the creation of at least $n - \sqrt{n}$ new nodes. The first part of the claim is trivial since each step
of the cycle performs at most $2\sqrt{n}$ updates. The second part follows from two straightforward facts:

1. There are at least $\sqrt{n} - 1$ heavy nodes in any n-node binary tree.

2. Consider any iteration of Step 1 of the cycle. The subtrees of the heavy nodes of B after the
iteration are all different from the subtrees of B at any time prior to the iteration. Therefore
the iteration recreates all heavy nodes of B by performing Cons operations.

The theorem follows immediately from the claim. □

4  An  optimal  representation  for  dense  dictionaries

We begin with the observation that Aragon and Seidel's representation [3] always creates trees of
height $\sqrt{|U|}$. They choose a static, random priority for each element in the universe and define the
representation of a set of items to be the binary search tree that is heap-ordered according to priorities.
It is well known [9] that any sequence of $p^2 + 1$ numbers contains a monotone subsequence of length
$p + 1$. If we choose the longest monotone subsequence of the sequence of priorities of elements in the
universe, then the representation of the set corresponding to this subsequence has height at least $\sqrt{|U|}$.
Therefore, their representation always requires $\Omega(\sqrt{|U|})$ time in the worst case to perform a dictionary
operation. Further, since it is possible to assign priorities to elements of the universe so that the longest
monotone subsequence has length at most $\sqrt{|U|} + 1$, there is a static priority representation in which
the trees have height $O(\sqrt{|U|})$. This representation allows a dictionary operation to be performed in
$O(\sqrt{|U|})$ time.
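
The two bounds above can be checked on a small case. The following Python fragment is a minimal sketch, not taken from [3]: the function names and the particular "block" priority assignment are assumptions made for this illustration. It builds priorities for a universe of size $p^2$ whose longest monotone subsequence has length p, and measures the height (in nodes) of the heap-ordered search tree.

```python
def longest_monotone(seq):
    """Length of the longest increasing or decreasing subsequence (quadratic DP)."""
    n = len(seq)
    inc, dec = [1] * n, [1] * n
    for i in range(n):
        for j in range(i):
            if seq[j] < seq[i]:
                inc[i] = max(inc[i], inc[j] + 1)
            if seq[j] > seq[i]:
                dec[i] = max(dec[i], dec[j] + 1)
    return max(inc + dec, default=0)


def block_priorities(p):
    """Hypothetical assignment for a universe {0, ..., p*p - 1}:
    priorities decrease inside each block of p keys and increase across blocks."""
    return [b * p + (p - 1 - pos) for b in range(p) for pos in range(p)]


def treap_height(keys, prio):
    """Height in nodes of the heap-ordered binary search tree on the sorted list keys."""
    if not keys:
        return 0
    k = max(range(len(keys)), key=lambda i: prio[keys[i]])  # max priority is the root
    return 1 + max(treap_height(keys[:k], prio), treap_height(keys[k + 1:], prio))


p = 5
prio = block_priorities(p)
full_universe = list(range(p * p))
increasing_chain = [b * p for b in range(p)]  # one key per block, increasing priorities
print(longest_monotone(prio), treap_height(full_universe, prio),
      treap_height(increasing_chain, prio))
```

The set `increasing_chain` is a monotone subsequence of priorities, so its tree degenerates into a path of p nodes, matching the lower bound argument; the full universe stays within roughly 2p nodes of height under this assignment.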

We shall now describe a more efficient representation that allows a dictionary operation to be
performed in $O(\log |U|)$ time. To aid the description of our representation it is convenient to assume
that $U = [1, 2^p]$. The representation is reminiscent of binary tries [8]. Let S denote the set being
represented. If $S \subseteq [1, 2^{p-1}]$, then S is represented in U by the tree representing it in the universe
$[1, 2^{p-1}]$. Otherwise, let $x = \min (S \cap [2^{p-1} + 1, 2^p])$. Then, x is the root of the tree representing S and
its left and right subtrees are, respectively, the trees representing subsets $S \cap [1, 2^{p-1}]$ and $S \cap [x + 1, 2^p]$
in the universes $[1, 2^{p-1}]$ and $[2^{p-1} + 1, 2^p]$. The height of the resulting tree is at most $\log |U|$. To
insert a new element x, we compare x with the root of the tree, say r. Suppose that r belongs to
interval $[2^{i-1} + 1, 2^i]$. We distinguish the following cases:

Case 1. $x > 2^i$: Make x the tree root with the old tree as the left subtree.

Case 2. $r < x \le 2^i$: Insert x into the right subtree recursively.

Case 3. $2^{i-1} + 1 \le x < r$: Place x at the tree root, replacing item r, and insert r into the right
subtree recursively.

Case 4. $x \le 2^{i-1}$: Insert x into the left subtree recursively.

Deletion is analogous to insertion. It is easy to see that each operation requires only $O(\log |U|)$ time
using any update mechanism.
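
The definition and the four insertion cases can be sketched in Python as follows. This is an illustrative sketch only: trees are modelled as nested tuples `(left, key, right)`, the parameter `lo` (the offset of the current sub-universe) and the helper names are assumptions of this example, and the boundary comparisons in Cases 2 to 4 are taken to be non-strict where the case analysis requires it.

```python
def level(key, lo):
    """Index i such that key - lo lies in [2^(i-1) + 1, 2^i]; level 0 is the slot lo + 1."""
    return (key - lo - 1).bit_length()


def build(s, lo=0):
    """Canonical tree of the set s, whose elements all lie above the offset lo."""
    if not s:
        return None
    i = level(max(s), lo)
    mid = lo + ((1 << i) >> 1)                 # end of the lower half of the universe
    x = min(e for e in s if e > mid)           # smallest element of the upper half
    left = build({e for e in s if e <= mid}, lo)
    right = build({e for e in s if e > x}, mid)
    return (left, x, right)


def insert(t, x, lo=0):
    """Insert x into the canonical tree t, following Cases 1 to 4."""
    if t is None:
        return (None, x, None)
    left, r, right = t
    if x == r:
        return t
    i = level(r, lo)
    mid, hi = lo + ((1 << i) >> 1), lo + (1 << i)
    if x > hi:                                  # Case 1: x becomes the new root
        return (t, x, None)
    if x > r:                                   # Case 2: go right
        return (left, r, insert(right, x, mid))
    if x > mid:                                 # Case 3: x replaces r, r moves right
        return (left, x, insert(right, r, mid))
    return (insert(left, x, lo), r, right)      # Case 4: go left


def height(t):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(height(t[0]), height(t[2]))


t = None
for x in [7, 6, 4, 3, 1]:
    t = insert(t, x)
print(t == build({1, 3, 4, 6, 7}), height(t))
```

Because the tree depends only on the set, inserting the same elements in any order yields the tree returned by `build`, which is the uniqueness property the representation is meant to provide.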

5  Maintaining  a  dynamic  sparse  array 

First, we describe a way of maintaining a dynamic sparse array with $O(1/\epsilon)$ time per operation and
$O(N^\epsilon)$ space per update, for any $0 < \epsilon < 1$. We represent the array by a trie [8] with $N^\epsilon$ branches
per node. See Figure 5. Each node of the trie maintains a subset of the array entries by hashing the
subset uniformly into $N^\epsilon$ buckets. The trie root maintains the entire array. If two or more nonzero
entries hash into a bucket, the bucket contains a pointer to a child node that recursively maintains
the entries of the bucket. Otherwise, either the bucket has exactly one nonzero entry or is empty. As
we have described, the trie may contain nodes with exactly one nonempty bucket (degree-1 nodes).
We save space by shrinking paths of degree-1 nodes and labeling each shrunk path with the string of
hash values it spells out. Since the height of the trie is $O(1/\epsilon)$, the cost of looking up an array entry
is $O(1/\epsilon)$. To update an entry, we first determine if the entry is 0. If so, consider the lowest bucket of
the trie into which the entry hashes. If this bucket is empty, we simply add the entry to the bucket.
If the bucket already contains a nonzero entry, we create a new node storing both entries and point to
the node from the bucket. Finally, if the bucket points to another node, we split the edge leaving this
bucket by creating a new node that holds the new entry. Creating a new node involves initializing $N^\epsilon$
new buckets, but this can be accomplished in $O(1)$ time using a standard array initialization trick [1]
(see Ex. 2.12, page 71). The other case where we update a nonzero entry is handled similarly. Clearly,
an update operation requires $O(1/\epsilon)$ time and $O(N^\epsilon)$ space.
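
The array initialization trick mentioned above can be sketched as follows. This is a simulation under stated assumptions: Python cannot hand out uninitialized memory, so the three arrays are deliberately filled with arbitrary garbage to stand in for it, and the class name `LazyArray` and its methods are hypothetical. The point is that `get` and `set` never rely on the arrays having been cleared.

```python
import random


class LazyArray:
    """Array that behaves as if filled with `default` without ever being cleared."""

    def __init__(self, size, default=0, seed=0):
        rng = random.Random(seed)
        # Garbage contents model uninitialized memory (assumption of this sketch).
        self.data = [rng.randrange(size) for _ in range(size)]
        self.where = [rng.randrange(size) for _ in range(size)]
        self.stack = [rng.randrange(size) for _ in range(size)]
        self.top = 0              # number of slots that have really been written
        self.default = default

    def _written(self, i):
        # Slot i is valid only if where[i] points into the used part of the stack
        # and that stack cell points back to i.
        j = self.where[i]
        return j < self.top and self.stack[j] == i

    def get(self, i):
        return self.data[i] if self._written(i) else self.default

    def set(self, i, value):
        if not self._written(i):
            self.where[i] = self.top
            self.stack[self.top] = i
            self.top += 1
        self.data[i] = value


buckets = LazyArray(16)
buckets.set(3, 7)
print(buckets.get(3), buckets.get(4))
```

Each operation touches a constant number of cells, which is what lets a new trie node with $N^\epsilon$ buckets be created in constant time in a model where memory allocation itself is free.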

Next, we improve the space per update to $O(N^\epsilon / 2^{1/\epsilon})$ (amortized) by making the time per update
amortized (instead of worst-case) and sacrificing a constant factor in the query time. Setting
$\epsilon = 1/\sqrt{\log N}$, we obtain a strategy with $O(\sqrt{\log N})$ query time and $O(\sqrt{\log N})$ amortized time and
$O(1)$ amortized space per update. To this end, we impose the constraint that every node of the trie
contains more than $2^{1/\epsilon}/\epsilon$ nonzero entries. If there are at most $2^{1/\epsilon}/\epsilon$ nonzero entries in total, we store
these entries in a balanced tree instead of a trie. We no longer require that nodes have at least two nonempty buckets.
A bucket (of a node) that contains 1 to $2^{1/\epsilon}/\epsilon$ nonzero entries stores these entries in a balanced tree,
ordered according to their indices in the array. A bucket containing more than $2^{1/\epsilon}/\epsilon$ nonzero entries points
to a child node that recursively maintains these entries. Looking up an array entry involves finding
the lowest bucket of the trie into which the entry hashes and searching for the entry in the balanced
tree stored at the bucket. This process takes $O(1/\epsilon)$ time. Updating an entry is similar, except that,
if the number of nonzero entries in a bucket exceeds $2^{1/\epsilon}/\epsilon$, we create a new child node below
this bucket and hash the bucket entries into the buckets of the new node. Creation of a new node may
propagate at most $1/\epsilon$ times.

Now, we show that this data structure has the claimed amortized time and space bounds per
update operation. Define the height of an entry in the trie to be $1/\epsilon - d$, where d denotes the depth
of the lowest node in the trie containing the entry. Each nonzero entry maintains a temporal potential
equal to h, its height in the trie, and a spatial potential equal to $\epsilon h N^\epsilon / 2^{1/\epsilon}$. The only expensive
steps in an update operation are steps that create new nodes. The creation of a new node requires
$N^\epsilon$ space and time proportional to the number of nonzero entries hashing into the node. Since the
creation of a new node decreases the heights of nonzero entries hashing into that node, the decrease in
the spatial and temporal potentials of these entries, respectively, pay for the space and time used to
create the node. We charge the creation of spatial and temporal potentials for a new nonzero entry
to the update operation that creates the entry. Thus an update operation requires $O(1/\epsilon)$ amortized
time and $O(N^\epsilon / 2^{1/\epsilon})$ amortized space.
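
The accounting can be written out explicitly. The following worked calculation is a sketch that assumes the node threshold $2^{1/\epsilon}/\epsilon$ and the spatial potential $\epsilon h N^\epsilon / 2^{1/\epsilon}$ per entry of height h, as read from the description above; the symbol $\Delta\Phi_{\mathrm{space}}$ is introduced here only for illustration.

```latex
% A new node receives more than 2^{1/\epsilon}/\epsilon entries, and each of them
% drops in height by at least 1, so the spatial potential released is at least
\[
  \Delta\Phi_{\mathrm{space}}
  \;\ge\; \frac{2^{1/\epsilon}}{\epsilon}\cdot\frac{\epsilon N^{\epsilon}}{2^{1/\epsilon}}
  \;=\; N^{\epsilon},
\]
% which covers the N^\epsilon space of the node. A new entry has height at most
% 1/\epsilon, so the potential charged to the update that creates it is at most
\[
  \epsilon\cdot\frac{1}{\epsilon}\cdot\frac{N^{\epsilon}}{2^{1/\epsilon}}
  \;=\; \frac{N^{\epsilon}}{2^{1/\epsilon}} .
\]
% With \epsilon = 1/\sqrt{\log N} (logarithms to base 2):
\[
  N^{\epsilon} = 2^{\epsilon \log N} = 2^{\sqrt{\log N}} = 2^{1/\epsilon}
  \quad\Longrightarrow\quad
  \frac{N^{\epsilon}}{2^{1/\epsilon}} = 1 .
\]
```

The same argument with temporal potential h gives at most $1/\epsilon$ units of time charged per new entry.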

This  scheme  works  so  long  as  nonzero  entries  are  not  made  0  through  updates.  When  we  make  a 
nonzero  entry  0,  we  can  just  treat  the  entry  like  a  nonzero  entry  and  leave  it  in  the  trie.  If  we  wish,  in 
order  to  save  space,  we  can  periodically  compact  the  trie  by  eliminating  all  zero  entries.  If  we  perform 
a  compaction  whenever  the  number  of  zero  entries  becomes  a  constant  fraction  of  the  total  number 
of  entries  stored  in  the  trie,  then  it  is  easy  to  show  that  the  amortized  time  per  update  increases  by 
only  a  constant  factor. 

6  Set  Equality-testing 

We describe a data structure for set equality-testing based on our binary search tree representation
for dense dictionaries in Section 4. It is also possible to base the description of the data structure
on the binary trie representation of sets. We number the elements seen so far in serial order and call
the collection of these elements the universe. Every set in the universe is represented by a binary
search tree according to this numbering as described in Section 4. The arrival of new elements into the
universe during set operations does not cause problems since new elements are assigned higher serial
numbers than existing elements. Two sets are equal if and only if their roots are identical. We update
a set recursively, as described in Section 4, using Cons operations. The number of Cons operations
performed per update is at most $\log m$, where m is the number of update operations.

We show how to implement a Cons operation in $O(\sqrt{\log m})$ amortized time and $O(1)$ amortized
space, and obtain an implementation of set updates in $O((\log m)^{3/2})$ amortized time and $O(\log m)$
amortized  space.  A  Cons  operation  determines  whether  a  binary  search  tree  is  present  in  a  collection 
of  trees,  given  its  two  subtrees  and  root  element.  Here,  the  collection  is  all  the  trees  representing 
sets  together  with  their  subtrees.  If  the  tree  is  in  the  collection,  it  is  simply  returned;  otherwise  it  is 
added to the collection and then returned. We serially assign numbers to all trees in the collection
according to their order of creation. Associating with each tree a triple (l,r,x) of numbers of its two
subtrees  and  root  element,  we  see  that  a  Cons  operation  is  equivalent  to  asking  for  a  particular  triple 
in a collection of triples. If m denotes the total number of update operations, then the coordinates of
a triple corresponding to the subtrees are at most $m \log m$ and the coordinate corresponding to the
root element is at most m. We maintain the collection of triples (and their corresponding trees) in an
array of size $m^3 (\log m)^2$. Using the techniques of the previous section to implement this array, we see that
a Cons operation can be implemented in $O(\sqrt{\log m})$ amortized time and $O(1)$ amortized space. This
description assumes an a priori knowledge of m. To eliminate this requirement, we guess m initially,
and double the value of the guess each time it turns out to be incorrect. Whenever we change our
guess, we create a bigger sparse array and store the collection of triples in it. It is easy to check that
this increases the amortized time of an update by only a constant factor.
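
The role of Cons in equality testing can be sketched in Python. This is an illustration under assumptions, not the data structure analyzed above: a built-in `dict` stands in for the dynamic sparse array, so the stated time and space bounds do not apply, and the class name `ConsTable`, the offset parameter `lo`, and the convention that serial number 0 denotes the empty tree are all choices made for this example.

```python
class ConsTable:
    """Collection of trees identified by serial numbers, with set insertion via Cons."""

    def __init__(self):
        self.serial = {}        # triple (l, r, x) -> serial number (stand-in for the sparse array)
        self.triples = [None]   # serial number -> triple; serial 0 is the empty tree

    def cons(self, l, x, r):
        """Return the serial number of the tree with subtrees l, r and root element x."""
        key = (l, r, x)
        if key not in self.serial:           # tree not yet in the collection: add it
            self.triples.append(key)
            self.serial[key] = len(self.triples) - 1
        return self.serial[key]

    def insert(self, t, x, lo=0):
        """Insert element x into the set whose tree has serial number t."""
        if t == 0:
            return self.cons(0, x, 0)
        left, right, root = self.triples[t]
        if x == root:
            return t
        i = (root - lo - 1).bit_length()     # root lies in [lo + 2^(i-1) + 1, lo + 2^i]
        mid, hi = lo + ((1 << i) >> 1), lo + (1 << i)
        if x > hi:                           # Case 1
            return self.cons(t, x, 0)
        if x > root:                         # Case 2
            return self.cons(left, root, self.insert(right, x, mid))
        if x > mid:                          # Case 3
            return self.cons(left, x, self.insert(right, root, mid))
        return self.cons(self.insert(left, x, lo), root, right)   # Case 4


table = ConsTable()
a = b = 0
for x in [5, 2, 9]:
    a = table.insert(a, x)
for x in [9, 5, 2]:
    b = table.insert(b, x)
print(a == b)   # equal sets share one root, so equality is a single comparison
```

Since every tree in the collection is created through `cons`, two sets are equal exactly when the serial numbers of their roots coincide.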

7  Sequence  Equality-testing 

In  this  section  we  present  two  data  structures  for  testing  equality  of  sequences. 

7.1  Data  structure  1 

We represent a sequence by an almost complete binary tree as described in Section 2. The ith node of
the tree in symmetric order stores the ith element of the sequence. As before, for each $j \le \lfloor \log n / 2 \rfloor$,
we maintain a list of j-trees of the sequence. One problem with this representation is that a node can
belong to several sequences and hence to several lists of j-trees. Since a node can have several left
and right pointers corresponding to the lists of j-trees that contain it, how do we navigate the list of
j-trees of a particular sequence efficiently?

We  solve  this  problem  using  a  result  of  Driscoll  et  al.  [6]  for  making  pointer-based  data  structures 
persistent.  A  data  structure  is  fully  persistent  if  it  maintains  all  versions  of  the  data  structure  created 
by  the  update  operations  so  far  and  permits  accesses  and  updates  (that  create  new  versions)  to  any 
existing  version.  Driscoll  et  al.’s  result  [6]  is  that  any  pointer-based  data  structure  in  which  nodes  have 
constant  bounded  indegree  can  be  made  fully  persistent  at  the  expense  of  weakening  the  time  and 
space  per  update  operation  to  amortized  bounds  (if  the  original  bounds  are  amortized,  they  remain 
amortized).  In  our  data  structure  nodes  have  indegree  at  most  4,  since  a  node  has  at  most  two  parents 
and  at  most  two  adjacent  nodes  in  its  list  of  j-trees  that  point  to  it.  Therefore,  our  data  structure 
has  a  fully  persistent  counterpart  that  maintains  all  sequences  created  so  far  and  permits  accessing 
and  updating  any  of  them. 

One  remaining  problem  is  the  incorporation  of  Cons  operations  into  the  fully  persistent  version  of 
our  data  structure.  We  implement  Cons  operations  as  in  the  previous  section,  by  serially  numbering 
all  the  elements  and  the  trees  and  maintaining  triples  corresponding  to  trees  in  a  sparse  dynamic 
array.  This  gives  rise  to  two  fresh  problems: 

1.  We  are  using  random  access  to  access  nodes  of  the  data  structure  via  Cons  operations,  whereas 
only  purely  pointer-based  data  structures  can  be  made  fully  persistent. 

2. The method used by Driscoll et al. to achieve full persistence is to split a node when it has to
maintain  too  many  pointers  corresponding  to  different  versions  of  the  data  structure.  Therefore, 
a  node  of  the  data  structure  that  is  the  root  of  a  single  binary  tree  can  have  many  different 
copies  representing  the  same  tree.  How  do  we  know  that  two  nodes  represent  the  same  tree? 
The  answer  to  this  question  is  required  to  determine  if  two  sequences  are  equal,  given  the  roots 
of  their  trees,  and  to  implement  Cons  operations. 


The  second  problem  is  solved  easily  by  maintaining  in  each  node  the  serial  number  of  the  tree  it 
represents.  Therefore,  to  test  the  equality  of  two  sequences,  we  simply  compare  the  serial  numbers  of 
the  roots  of  their  binary  trees.  When  a  node  is  split,  the  two  resulting  copies  get  its  serial  number. 

We  solve  Problem  1  by  modifying  the  implementation  of  a  Cons  operation  as  follows.  Previously, 
a  Cons  operation  created  a  new  node  corresponding  to  the  root  of  a  tree  only  if  the  tree  was  not 
already  in  the  collection.  Now,  a  Cons  operation  always  creates  a  node  representing  the  root  of  a 
tree  irrespective  of  whether  the  tree  is  in  the  collection  or  not.  If  the  tree  already  exists,  its  serial 
number  is  stored  in  the  newly  created  node  besides  pointers  to  nodes  that  represent  its  two  subtrees. 
Otherwise  the  tree  is  given  the  next  available  serial  number,  its  triple  is  added  to  the  collection  of 
triples, and its number is stored in the newly created node. This solves Problem 1, since we no longer
access nodes of the data structure using random access. Random access is used only to obtain the
serial number of a triple. Since a serial number is an information field in a node and we are free to
perform any computation on the information fields of nodes of the data structure, the data structure
can be made fully persistent without any difficulty.
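
The modified Cons operation can be sketched as follows. This is a minimal illustration with hypothetical names (`Node`, `Collection`, `same_tree`); a Python `dict` again stands in for the sparse array of triples, and the persistence machinery of [6] is not modelled. What the sketch shows is that every call allocates a fresh node, while the random-access table is consulted only to obtain a serial number.

```python
class Node:
    """A node carrying the serial number of the tree it represents."""

    def __init__(self, serial, elem, left, right):
        self.serial = serial
        self.elem = elem
        self.left = left
        self.right = right


class Collection:
    def __init__(self):
        self.serial_of = {}   # triple of serial numbers -> serial number of the tree

    def cons(self, left, elem, right):
        """Always create a new node; reuse the serial number if the tree is known."""
        key = (left.serial if left is not None else 0,
               right.serial if right is not None else 0,
               elem)
        if key not in self.serial_of:
            self.serial_of[key] = len(self.serial_of) + 1   # next available number
        return Node(self.serial_of[key], elem, left, right)


def same_tree(a, b):
    """Two nodes represent the same tree exactly when their serial numbers agree."""
    return a.serial == b.serial


c = Collection()
leaf1 = c.cons(None, "x", None)
leaf2 = c.cons(None, "x", None)          # a distinct node, same tree
print(leaf1 is leaf2, same_tree(leaf1, leaf2))
```

Copies produced by node splitting would simply inherit the `serial` field, so the comparison in `same_tree` remains valid for them.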

The fully persistent version of the data structure solves the sequence equality-testing problem.
Now, we analyze its performance. The cost of testing equality of two sequences is $O(1)$. The time to
update a sequence of length n is $O(\sqrt{n \log m})$ (amortized), where m denotes the number of update
operations, and the space required is $O(\sqrt{n})$ (amortized). If we do not know m in advance, as before
we guess m and keep updating our guess, thereby creating bigger and bigger sparse arrays. This
makes our bound on the amortized time per update slightly worse. It is easy to show that the new
bound is $O(\sqrt{n \log m})$, where n denotes the maximum length of a sequence.

7.2  Data  structure  2 

We combine the idea of maintaining lists of j-trees for a sequence with another representation of
sequences. We associate a parameter $K = 2^p - 1$ with each sequence, to be specified later. For each
sequence, we maintain the lists of its j-trees, for $j \le p$. We store the lists of j-trees of all sequences
in a fully persistent manner similar to Data structure 1. We also maintain a sequence of serial numbers
of its representative p-trees in left-to-right order. The representative p-trees of a sequence consist of
every Kth p-tree and the last p-tree of the sequence. The sequence of serial numbers is called the
signature of a sequence. We store the signatures of all sequences in a lexicographic splay tree [13].
The amortized cost of accessing a signature or inserting a new signature into a lexicographic splay
tree (signature tree, henceforth) is $O(l + \log N)$, where l denotes the length of the signature and N
denotes the number of signatures in the tree. This completes the representation.

Two sequences are identical if and only if their signatures are identical and this can be checked in
$O(1)$ time if each sequence keeps track of the location of its signature in the signature tree. To update
a sequence, we update the lists of j-trees of the sequence bottom up using Cons operations. This is
accomplished as in Data structure 1. Then, we compute and insert the signature of the new sequence
into the signature tree. The amortized time required by an update is $O(K \sqrt{\log m} + n/K + \log m)$, where
n is the length of the updated sequence and m is the total number of update operations. This follows
from the facts that updating the lists of j-trees requires $O(K)$ Cons operations and that creation and
insertion of the signature of a new sequence into the signature tree requires $O(n/K + \log m)$ time. The
amortized space needed by an update is $O(K + n/K)$. Setting $K = 2^p - 1 > \sqrt{n}/(\log m)^{1/4} > 2^{p-1}$,
it follows that an update requires $O(\sqrt{n} (\log m)^{1/4} + \log m)$ amortized time and $O(\sqrt{n} (\log m)^{1/4})$
amortized space.
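
The choice of K balances the two terms of the time bound that depend on it. The short derivation below is a sketch assuming the bounds $O(K \sqrt{\log m} + n/K + \log m)$ for time and $O(K + n/K)$ for space as stated above, and ignores the rounding of K to the form $2^p - 1$.

```latex
\[
  K\sqrt{\log m} \;=\; \frac{n}{K}
  \quad\Longleftrightarrow\quad
  K^{2} = \frac{n}{\sqrt{\log m}}
  \quad\Longleftrightarrow\quad
  K = \frac{\sqrt{n}}{(\log m)^{1/4}} .
\]
% Substituting this K back into the bounds:
\[
  K\sqrt{\log m} + \frac{n}{K} + \log m
  \;=\; 2\sqrt{n}\,(\log m)^{1/4} + \log m ,
  \qquad
  K + \frac{n}{K}
  \;=\; \frac{\sqrt{n}}{(\log m)^{1/4}} + \sqrt{n}\,(\log m)^{1/4}
  \;=\; O\!\left(\sqrt{n}\,(\log m)^{1/4}\right).
\]
```

Rounding K up to the next value of the form $2^p - 1$ changes it by at most a constant factor, so the asymptotic bounds are unaffected.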

We have tacitly ignored the problem that, when the parameter K of a sequence suddenly increases
due to an insertion, we have to create the entire list of p-trees for the new sequence. In order to avoid
this expensive operation, for each sequence, we also maintain a partial list of its (p + 1)-trees and add
a new (p + 1)-tree to this list during each insertion, unless the list is complete. Note that this list
has to be updated during an update operation. We claim that whenever K increases, the complete
list of (p + 1)-trees of the updated sequence is available. This is because, for the quantity $\sqrt{n}/(\log m)^{1/4}$
to double, n must quadruple. In other words, we must perform a series of 3n insertions between two
successive increases of parameter K, and after the first n insertions the construction of the list of
(p + 1)-trees is complete. Therefore, when K increases during an insertion, the list of $(p_{\mathrm{old}} + 1)$-trees
of the old sequence becomes the list of $p_{\mathrm{new}}$-trees of the new sequence. It is easy to check that this
only increases the time and space per update by a constant factor.

Once again, we have assumed that we know m beforehand. If this is not the case, we keep guessing
m, and each time we change our guess, we increase the size of the array of triples and reexecute the
whole sequence of update operations so far for the new value of m. This is necessary since the
parameter K of a sequence depends on m. This weakens our bound on the amortized time per update
to $O(\sqrt{n} (\log m)^{1/4} + \log m)$, where n is the maximum length of a sequence.

8  Conclusion 

Several interesting open problems are raised by our work:

1. Unique binary search tree representations: We have seen some ways of uniquely representing
a dictionary by a binary search tree that are optimal when the dictionary is sparse
($|U| > r_{n+1}(cn)$) or dense ($|U| < n^c$). Determine the complexity of unique representations when
the dictionary is of intermediate density ($n^c < |U| < r_{n+1}(cn)$).

2. Dynamic sparse array maintenance: Dynamic sparse arrays are useful in efficiently implementing
Cons operations in high level languages like LISP and SETL. It would be of interest
to obtain sharp time-space tradeoffs on the complexity of maintaining a dynamic sparse array.

3. Set and Sequence equality-testing: Obtain sharp time-space tradeoffs for set and sequence
equality-testing. Another attractive avenue of research is to determine the randomized complexities
of these problems.

4. More powerful data types and operations: The programming language LISP allows generalized
lists, which are a generalization of sequences, and SETL allows sets and sequences to be themselves
composed of other sets and sequences. Join and split are natural operations for sequences and
the natural operations for sets are union, intersection, and set difference. It is a challenging
problem to devise an efficient implementation of these data types and operations. Since this
problem may not have an efficient deterministic solution, randomized solutions might be worth
exploring.